{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# S3VectorStore Integration\n",
    "\n",
    "This is a vector store integration for LlamaIndex that uses S3Vectors.\n",
    "\n",
    "[Find out more about S3Vectors](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors-getting-started.html).\n",
    "\n",
    "This notebook will assume that you have already created a S3 vector bucket (and possibly also an index).\n",
    "\n",
    "## Installation\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-vector-stores-s3 llama-index-embeddings-openai"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Usage\n",
    "\n",
    "### Creating the vector store object\n",
    "\n",
    "You can create a new vector index in an existing S3 bucket.\n",
    "\n",
    "If you don't have S3 credentials configured in your environment, you can provide a boto3 session with credentials."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.vector_stores.s3 import S3VectorStore\n",
    "import boto3\n",
    "\n",
    "vector_store = S3VectorStore.create_index_from_bucket(\n",
    "    # S3 bucket name or ARN\n",
    "    bucket_name_or_arn=\"test-bucket\",\n",
    "    # Name for the new index\n",
    "    index_name=\"my-index\",\n",
    "    # Vector dimension (e.g., 1536 for OpenAI embeddings)\n",
    "    dimension=1536,\n",
    "    # Distance metric: \"cosine\", \"euclidean\", etc.\n",
    "    distance_metric=\"cosine\",\n",
    "    # Data type for vectors\n",
    "    data_type=\"float32\",\n",
    "    # Batch size for inserting vectors (max 500)\n",
    "    insert_batch_size=500,\n",
    "    # Metadata keys that won't be filterable\n",
    "    non_filterable_metadata_keys=[\"custom_field\"],\n",
    "    # Optional: provide a boto3 session for custom AWS configuration\n",
    "    # sync_session=boto3.Session(...)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or, you can use an existing vector index in an existing S3 bucket."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.vector_stores.s3 import S3VectorStore\n",
    "import boto3\n",
    "\n",
    "vector_store = S3VectorStore(\n",
    "    # Index name or ARN\n",
    "    index_name_or_arn=\"my-index\",\n",
    "    # S3 bucket name or ARN\n",
    "    bucket_name_or_arn=\"test-bucket\",\n",
    "    # Data type for vectors (must match index)\n",
    "    data_type=\"float32\",\n",
    "    # Distance metric (must match index)\n",
    "    distance_metric=\"cosine\",\n",
    "    # Batch size for inserting vectors (max 500)\n",
    "    insert_batch_size=500,\n",
    "    # Optional: specify metadata field containing text if you already have a populated index\n",
    "    # text_field=\"content\",\n",
    "    # Optional: provide a boto3 session for custom AWS configuration\n",
    "    # sync_session=boto3.Session(...),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using the vector store with an index\n",
    "\n",
    "Once you have a vector store, you can use it with an index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import (\n",
    "    VectorStoreIndex,\n",
    "    StorageContext,\n",
    "    SimpleDirectoryReader,\n",
    "    Document,\n",
    ")\n",
    "from llama_index.embeddings.openai import OpenAIEmbedding\n",
    "\n",
    "# Load some data\n",
    "# documents = SimpleDirectoryReader(\"data\").load_data()\n",
    "\n",
    "# Or create new documents\n",
    "documents = [\n",
    "    Document(text=\"Hello, world!\", metadata={\"key\": \"1\"}),\n",
    "    Document(text=\"Hello, world! 2\", metadata={\"key\": \"2\"}),\n",
    "]\n",
    "\n",
    "# Create a new index\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents,\n",
    "    storage_context=StorageContext.from_defaults(vector_store=vector_store),\n",
    "    # optional: set the embed model\n",
    "    embed_model=OpenAIEmbedding(model=\"text-embedding-3-small\", api_key=\"...\"),\n",
    ")\n",
    "\n",
    "# Or reload from an existing index\n",
    "# index = VectorStoreIndex.from_vector_store(\n",
    "#     vector_store=vector_store,\n",
    "#     # optional: set the embed model\n",
    "#     # embed_model=embed_model,\n",
    "# )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello, world!\n"
     ]
    }
   ],
   "source": [
    "nodes = index.as_retriever(similarity_top_k=1).retrieve(\"Hello, world!\")\n",
    "print(nodes[0].text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also use filters to help you retrieve the correct nodes!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello, world! 2\n"
     ]
    }
   ],
   "source": [
    "from llama_index.core.vector_stores.types import (\n",
    "    MetadataFilters,\n",
    "    MetadataFilter,\n",
    "    FilterOperator,\n",
    "    FilterCondition,\n",
    ")\n",
    "\n",
    "nodes = index.as_retriever(\n",
    "    similarity_top_k=2,\n",
    "    filters=MetadataFilters(\n",
    "        filters=[\n",
    "            MetadataFilter(\n",
    "                key=\"key\",\n",
    "                value=\"2\",\n",
    "                operator=FilterOperator.EQ,\n",
    "            ),\n",
    "        ],\n",
    "        condition=FilterCondition.AND,\n",
    "    ),\n",
    ").retrieve(\"Hello, world!\")\n",
    "\n",
    "print(nodes[0].text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using the vector store directly\n",
    "\n",
    "You can also use the vector store directly:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello, world!\n"
     ]
    }
   ],
   "source": [
    "from llama_index.core.schema import TextNode\n",
    "from llama_index.core.vector_stores.types import VectorStoreQuery\n",
    "from llama_index.embeddings.openai import OpenAIEmbedding\n",
    "\n",
    "# embed nodes\n",
    "nodes = [\n",
    "    TextNode(text=\"Hello, world!\"),\n",
    "    TextNode(text=\"Hello, world! 2\"),\n",
    "]\n",
    "\n",
    "embed_model = OpenAIEmbedding(model=\"text-embedding-3-small\", api_key=\"...\")\n",
    "embeddings = embed_model.get_text_embedding_batch([n.text for n in nodes])\n",
    "for node, embedding in zip(nodes, embeddings):\n",
    "    node.embedding = embedding\n",
    "\n",
    "# add nodes to the vector store\n",
    "vector_store.add(nodes)\n",
    "\n",
    "# query the vector store\n",
    "query = VectorStoreQuery(\n",
    "    query_embedding=embed_model.get_query_embedding(\"Hello, world!\"),\n",
    "    similarity_top_k=2,\n",
    ")\n",
    "results = vector_store.query(query)\n",
    "print(results.nodes[0].text)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
